Iterative regularization


Beyond Tikhonov: faster learning with self-concordant losses, via iterative regularization

Neural Information Processing Systems

The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows one to derive various regularization schemes that yield faster convergence rates for the excess risk than Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self-concordant (GSC) loss functions, which include, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC losses by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization and overcomes the limitations of classical Tikhonov regularization.
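
The iterated Tikhonov scheme mentioned in this abstract amounts to repeatedly solving a Tikhonov problem anchored at the previous iterate, i.e., a proximal point step on the empirical risk. The numpy sketch below illustrates this for the squared loss in kernel form (the paper treats the harder GSC case, such as the logistic loss); the function name and the coefficient update alpha_{t+1} = (K + n*lam*I)^{-1}(y + n*lam*alpha_t) are illustrative assumptions, not the authors' code.

import numpy as np

def iterated_tikhonov(K, y, lam, n_steps):
    """Iterated Tikhonov for kernel least squares (proximal point on the empirical risk).

    Each step solves f_{t+1} = argmin_f (1/n) ||f(X) - y||^2 + lam * ||f - f_t||_H^2,
    which in representer coefficients reads alpha_{t+1} = (K + n lam I)^{-1} (y + n lam alpha_t).
    One step with alpha_0 = 0 recovers standard kernel ridge regression (plain Tikhonov).
    """
    n = K.shape[0]
    A = K + n * lam * np.eye(n)
    alpha = np.zeros(n)
    for _ in range(n_steps):
        alpha = np.linalg.solve(A, y + n * lam * alpha)
    return alpha  # predict at a new point x via sum_i alpha_i * k(x_i, x)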


Iterative regularization in classification via hinge loss diagonal descent

Apidopoulos, Vassilis, Poggio, Tomaso, Rosasco, Lorenzo, Villa, Silvia

arXiv.org Artificial Intelligence

Estimating a quantity of interest from finite measurements is a central problem in a number of fields, including machine learning, statistics, and signal processing. In this context, a key idea is that reliable estimation requires imposing some prior assumptions on the problem at hand. The theory of inverse problems provides a principled framework to formalize this idea [27]. The quantity of interest is typically seen as a function, or a vector, and prior assumptions take the form of suitable functionals, called regularizers. Following this idea, Tikhonov regularization provides a classic approach to estimating solutions [83, 84]. Indeed, solutions are found by minimizing an empirical objective in which a data-fit term is penalized by adding the chosen regularizer. Other regularization approaches are classic in inverse problems; in particular, iterative regularization has become popular in machine learning, see e.g.
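
For concreteness, the Tikhonov (penalized) estimator referred to above minimizes a data-fit term plus an explicit penalty; with loss $\ell$, regularizer $R$ and penalty parameter $\lambda > 0$ it reads
$$\hat f_\lambda \in \operatorname{arg\,min}_{f} \; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda\, R(f),$$
whereas the hinge-loss diagonal descent studied in the paper builds the regularization into the iteration itself, with the stopping time playing the role of $\lambda$.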


Deep unfolding as iterative regularization for imaging inverse problems

Cui, Zhuo-Xu, Zhu, Qingyong, Cheng, Jing, Liang, Dong

arXiv.org Artificial Intelligence

Recently, deep unfolding methods, which guide the design of deep neural networks (DNNs) through iterative algorithms, have received increasing attention in the field of inverse problems. Unlike general end-to-end DNNs, unfolding methods have better interpretability and performance. However, to our knowledge, their accuracy and stability in solving inverse problems cannot be fully guaranteed. To bridge this gap, we modify the training procedure and prove that the unfolding method is an iterative regularization method. More precisely, we adversarially learn a convex penalty function, parameterized by an input-convex neural network (ICNN), to characterize the distance to the real data manifold, and train a DNN unfolded from the proximal gradient descent algorithm with this learned penalty. Assuming the real data manifold intersects the set of solutions of the inverse problem only at the unique real solution, we prove that the unfolded DNN converges to it stably. Furthermore, we demonstrate with an example of MRI reconstruction that the proposed method outperforms conventional unfolding methods and traditional regularization methods in terms of reconstruction quality, stability, and convergence speed.
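
A bare-bones sketch of the unfolded architecture described here: each "layer" is one proximal gradient step for a linear inverse problem y = A x + noise. In the paper the proximal map comes from a penalty learned adversarially with an ICNN; below, a soft-thresholding prox is used purely as a placeholder, and all names and parameters are illustrative assumptions rather than the authors' implementation.

import numpy as np

def soft_threshold(v, tau):
    # Placeholder proximal operator; in the paper this role is played by the prox of a learned ICNN penalty.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unfolded_proximal_gradient(A, y, step, tau, n_layers):
    """Unroll n_layers proximal gradient steps: x_{k+1} = prox(x_k - step * A^T (A x_k - y))."""
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * tau)
    return x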


Iterative regularization for low complexity regularizers

Molinari, Cesare, Massias, Mathurin, Rosasco, Lorenzo, Villa, Silvia

arXiv.org Machine Learning

Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems but also in modern machine learning, where it provides both a new perspective on algorithm analysis and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by non-smooth and non-strongly convex functionals, which are prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm, whose convergence and stability properties we analyze even when the original problem is infeasible. The general results are illustrated in the special case of sparse recovery with the $\ell_1$ penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.
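
As a concrete instance of the sparse-recovery example, the numpy sketch below runs a textbook primal-dual (Chambolle-Pock style) iteration on min ||x||_1 subject to Ax = y, with the iteration count playing the role of the regularization parameter: one stops earlier when the data are noisier. It is given for illustration under these assumptions and is not necessarily the exact algorithm analyzed in the paper.

import numpy as np

def primal_dual_l1(A, y, n_iters):
    """Primal-dual iteration for min ||x||_1 s.t. Ax = y; early stopping provides the regularization."""
    op_norm = np.linalg.norm(A, 2)        # spectral norm of A
    sigma = tau = 1.0 / op_norm           # step sizes satisfying sigma * tau * ||A||^2 <= 1
    x = np.zeros(A.shape[1])
    x_bar = x.copy()                      # extrapolated primal variable
    u = np.zeros(A.shape[0])              # dual variable for the constraint Ax = y
    for _ in range(n_iters):
        u = u + sigma * (A @ x_bar - y)   # dual step (prox of the conjugate of the indicator of {y})
        v = x - tau * (A.T @ u)
        x_new = np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)  # prox of tau * ||.||_1 (soft-thresholding)
        x_bar = 2.0 * x_new - x
        x = x_new
    return x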


Iterative regularization for convex regularizers

Molinari, Cesare, Massias, Mathurin, Rosasco, Lorenzo, Villa, Silvia

arXiv.org Machine Learning

Machine learning often reduces to estimating some model parameters. This approach raises at least two kinds of questions: first, multiple solutions may exist, amongst which a specific one must be selected; second, potential instabilities with respect to noise and sampling must be controlled. A classical way to achieve both goals is to consider explicitly penalized or constrained objective functions. In machine learning, this leads to regularized empirical risk minimization (Shalev-Shwartz and Ben-David, 2014). A more recent approach is based on directly exploiting an iterative optimization procedure for an unconstrained/unpenalized problem. This approach is shared by several related ideas. One is implicit regularization (Mahoney, 2012; Gunasekar et al., 2017), stemming from the observation that the bias is controlled by increasing the number of iterations, just as in penalized methods it is controlled by decreasing the penalty parameter. Another is early stopping (Yao et al., 2007; Raskutti et al., 2014), which puts emphasis on the fact that running the iterates to convergence might lead to instabilities in the presence of noise. Yet another, and more classical, idea is iterative regularization, where both aspects (convergence and stability) are considered to be relevant (Engl et al., 1996; Kaltenbacher et al., 2008).
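
The correspondence stated above between the number of iterations and the penalty parameter can already be seen for least squares: the t-th gradient descent iterate on the unpenalized objective behaves roughly like a ridge (Tikhonov) solution with penalty of order 1/(step * t). The following sketch is purely illustrative of that heuristic and is not part of the paper.

import numpy as np

def gradient_descent_path(A, y, step, n_iters):
    """Gradient descent on (1/2)||A w - y||^2; early iterates are implicitly regularized."""
    w = np.zeros(A.shape[1])
    path = []
    for _ in range(n_iters):
        w = w - step * A.T @ (A @ w - y)
        path.append(w.copy())
    # path[t] is roughly comparable to the ridge solution (A^T A + lam I)^{-1} A^T y with lam ~ 1/(step * (t + 1))
    return path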


Iterative Regularization for Learning with Convex Loss Functions

Lin, Junhong, Rosasco, Lorenzo, Zhou, Ding-Xuan

arXiv.org Machine Learning

We consider the problem of supervised learning with convex loss functions and propose a new form of iterative regularization based on the subgradient method. Unlike other regularization approaches, in iterative regularization no constraint or penalization is considered, and generalization is achieved by (early) stopping an empirical iteration. We consider a nonparametric setting, in the framework of reproducing kernel Hilbert spaces, and prove finite-sample bounds on the excess risk under general regularity conditions. Our study provides a new class of efficient regularized learning algorithms and gives insights into the interplay between statistics and optimization in machine learning.
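
A minimal numpy sketch of this kind of procedure: a plain subgradient iteration on the unpenalized empirical risk for a convex loss, here the hinge loss with a linear model, regularized only by early stopping. The constant step size, the stopping rule left to the caller, and the linear (rather than reproducing kernel Hilbert space) parameterization are simplifying assumptions for illustration.

import numpy as np

def early_stopped_subgradient_hinge(X, y, step, n_iters):
    """Subgradient method on the unpenalized hinge risk (1/n) sum_i max(0, 1 - y_i <w, x_i>).

    No constraint or penalty is used; the number of iterations n_iters (early stopping)
    plays the role of the regularization parameter.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)
        active = margins < 1.0                               # samples where the hinge is active
        subgrad = -(X[active] * y[active][:, None]).sum(axis=0) / n
        w = w - step * subgrad
    return w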